
    Self-supervised automated wrapper generation for weblog data extraction

    Data extraction from the web is notoriously hard. Of the types of resources available on the web, weblogs are becoming increasingly important due to the continued growth of the blogosphere, but they remain poorly explored. Past approaches to data extraction from weblogs have often involved manual intervention and suffer from low scalability. This paper proposes a fully automated information extraction methodology based on the use of web feeds and processing of HTML. The approach includes a model for generating a wrapper that exploits web feeds to derive a set of extraction rules automatically. Instead of performing a pairwise comparison between posts, the model matches the values of the web feeds against their corresponding HTML elements retrieved from multiple weblog posts. It adopts a probabilistic approach to derive the set of rules and automate the process of wrapper generation. An evaluation of the model conducted on a dataset of 2,393 posts shows 92% accuracy, indicating that the proposed technique enables robust extraction of weblog properties and can be applied across the blogosphere for applications such as improved information retrieval and more robust web preservation initiatives.
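The feed-to-HTML matching step can be sketched as follows (a minimal Python illustration on toy markup; the class and function names are hypothetical, not from the paper): each feed value is searched for among the text nodes of several posts, and the element path that matches most consistently is taken as the extraction rule.

```python
from html.parser import HTMLParser
from collections import Counter

class PathRecorder(HTMLParser):
    """Records the tag/class path to every text node in a document."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_paths = []  # (path, text) pairs

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.text_paths.append(("/".join(self.stack), data.strip()))

def derive_title_rule(posts):
    """posts: list of (feed_title, html) pairs from the same weblog.
    Returns the element path that most often holds the feed title."""
    votes = Counter()
    for feed_title, html in posts:
        recorder = PathRecorder()
        recorder.feed(html)
        for path, text in recorder.text_paths:
            if text == feed_title:          # feed value located in the HTML
                votes[path] += 1
    return votes.most_common(1)[0][0]

posts = [
    ("First post",  "<div class='post'><h2 class='title'>First post</h2><p>Hello</p></div>"),
    ("Second post", "<div class='post'><h2 class='title'>Second post</h2><p>World</p></div>"),
]
print(derive_title_rule(posts))  # div.post/h2.title
```

Matching over multiple posts rather than a single one is what makes the rule derivation robust: a path that coincidentally contains the title in one post is outvoted by the path that contains it in all posts.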

    Intelligent Self-Repairable Web Wrappers

    The amount of information available on the Web grows at an incredibly high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with malfunctioning or failures. On the other hand, the literature lacks solutions for the maintenance of these systems. Procedures that extract Web data may be strictly interconnected with the structure of the data source itself; thus, malfunctioning or the acquisition of corrupted data can be caused, for example, by structural modifications of the data sources introduced by their owners. Nowadays, verification of data integrity and maintenance are mostly managed manually in order to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to create procedures able to extract data from Web sources -- the so-called Web wrappers -- which can cope with malfunctioning caused by modifications of the structure of the data source and can automatically repair themselves.
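The self-repair idea can be sketched as follows (a hypothetical Python illustration, not the authors' implementation): the wrapper keeps values it extracted successfully in the past and, when its current rule stops matching after a structural change, relocates one of those known values in the modified page to rebuild the rule.

```python
import re

class SelfRepairingWrapper:
    """Minimal sketch (hypothetical names): regex-based extraction with
    automatic repair when the page structure changes."""
    def __init__(self, pattern, samples):
        self.pattern = pattern      # current extraction rule
        self.samples = samples      # values extracted successfully in the past

    def extract(self, page):
        m = re.search(self.pattern, page)
        if m:
            return m.group(1)
        return self._repair_and_retry(page)

    def _repair_and_retry(self, page):
        # Repair: locate a previously seen value in the modified page and
        # generalise its new enclosing markup into a fresh rule.
        for value in self.samples:
            i = page.find(value)
            if i == -1:
                continue
            prefix = page[:i].rsplit("<", 1)[-1]   # tag enclosing the value
            tag = prefix.split()[0].rstrip(">")
            self.pattern = rf"<{tag}[^>]*>([^<]+)</{tag}>"
            m = re.search(self.pattern, page)
            return m.group(1) if m else None
        return None

w = SelfRepairingWrapper(r"<span class='price'>([^<]+)</span>", samples=["19.99"])
print(w.extract("<div><span class='price'>19.99</span></div>"))  # 19.99 (rule still valid)
# The site redesign replaced <span> with <em>; the old rule fails, and the
# wrapper relocates a known value to rebuild its extraction rule:
print(w.extract("<div><em class='price'>19.99</em></div>"))      # 19.99 (after self-repair)
print(w.extract("<div><em class='price'>24.50</em></div>"))      # 24.50 (new rule in use)
```

Real wrappers operate on DOM trees and use more robust similarity measures than exact string matching, but the essential mechanism is the same: past output serves as the oracle for repairing the extraction rule.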

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses ontologies as a central modelling strategy for the capture of Knowledge from legacy documents, either via automated means or directly in systems interfacing with Knowledge workers through user-defined, web-based forms. The domain ontologies used for Knowledge capture also guide the retrieval of the Knowledge extracted from the data, using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.

    Multiple Representations in Geographic Information Systems

    Geographic information systems (GIS) deal with data which can potentially be useful for a wide range of applications. However, the information needs of each application usually vary, especially in resolution, level of detail, and representation style. This thesis presents a set of primitives that allow the specification of operational processes, such as transformations between representations, through the use of a dynamic schema.
    Sociedad Argentina de Informática e Investigación Operativa

    Data-driven XPath generation

    The XPath query language offers a standard for information extraction from HTML documents. To this end, the DOM tree representation is typically used, which models the hierarchical structure of the document. One of the key aspects of HTML is the separation between data and the structure used to represent it. A consequence thereof is that data extraction algorithms usually fail to identify data if the structure of a document is changed. In this paper, we investigate how a set of tabular-oriented XPath queries can be adapted in such a way that it deals with modifications in the DOM tree of an HTML document. The basic idea is that if data has already been extracted in the past, it can be used to reconstruct XPath queries that retrieve the same data from a different DOM tree. Experimental results show the accuracy of our method.
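A minimal sketch of this data-driven idea (hypothetical names, toy XHTML; the paper's actual algorithm is more elaborate): previously extracted values are located in the modified DOM tree, and an XPath is rebuilt from the path to the matching node.

```python
import xml.etree.ElementTree as ET

def regenerate_xpath(old_values, new_html):
    """Relocate previously extracted values in a changed DOM tree and
    rebuild an XPath expression that reaches them."""
    root = ET.fromstring(new_html)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent = {child: node for node in root.iter() for child in node}

    def path_to(node):
        steps = []
        while node is not None:
            steps.append(node.tag)
            node = parent.get(node)
        return "/" + "/".join(reversed(steps))

    for node in root.iter():
        if (node.text or "").strip() in old_values:
            return path_to(node)
    return None

# Data extracted before the layout change:
old_values = {"Alice", "Bob"}
# The same page after a template redesign (div/span instead of a table):
new_html = "<html><body><div><span>Alice</span></div></body></html>"
print(regenerate_xpath(old_values, new_html))  # /html/body/div/span
```

A production version would generalise from several matched nodes (adding predicates or positional steps) rather than returning the path of the first hit, but the principle is the one the abstract describes: old data reconstructs new queries.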

    Evolution in the number of authors of computer science publications

    This article analyses the evolution in the number of authors of scientific publications in computer science (CS). The analysis is based on a framework that structures CS into 17 constituent areas, proposed by Wainer et al. (Commun ACM 56(8):67–73, 2013), so that indicators can be calculated for each one in order to make comparisons. We collected and mined over 200,000 article references from 81 conferences and journals in the considered CS areas, spanning a 60-year period (1954–2014). The main insight of this article is that all CS areas witness an increase in the average number of authors in every decade, with just one slight exception. We ordered the article references by number of authors in ascending chronological order and grouped them into decades. For each CS area, we provide a perspective of how many groups (1-author papers, 2-author papers and so on) must be considered to reach certain proportions of the total for that CS area, e.g., the 90th and 95th percentiles. Different CS areas require different numbers of groups to reach those percentiles. For all 17 CS areas, an analysis of the point in time at which publications with n+1 authors overtake publications with n authors is presented. Finally, we analyse the average number of authors and their rate of increase. This work was supported by FCT - Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/2013.
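The two descriptive statistics discussed above, average authors per decade and the year in which (n+1)-author papers overtake n-author papers, can be computed in a few lines of Python (toy records below, not the paper's dataset; the overtake criterion here uses cumulative counts, which may differ from the paper's exact definition):

```python
from collections import Counter

def authors_per_decade(records):
    """records: (year, n_authors) pairs. Returns {decade: mean authors}."""
    by_decade = {}
    for year, n in records:
        by_decade.setdefault(year // 10 * 10, []).append(n)
    return {d: sum(v) / len(v) for d, v in sorted(by_decade.items())}

def overtake_year(records, n):
    """First year in which cumulative (n+1)-author papers outnumber
    cumulative n-author papers."""
    counts = Counter()
    for year, k in sorted(records):
        counts[k] += 1
        if counts[n + 1] > counts[n]:
            return year
    return None

# Illustrative toy data: (publication year, number of authors)
records = [(1960, 1), (1961, 1), (1975, 2), (1976, 2), (1977, 2),
           (1990, 3), (1991, 3), (1992, 3), (1993, 3)]
print(authors_per_decade(records))   # {1960: 1.0, 1970: 2.0, 1990: 3.0}
print(overtake_year(records, 2))     # 1993
```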

    Dispersal syndromes in challenging environments: A cross‐species experiment

    Dispersal is a central biological process tightly integrated into life histories, morphology, physiology and behaviour. Such associations, or syndromes, are anticipated to impact the eco-evolutionary dynamics of spatially structured populations and cascade into ecosystem processes. As for dispersal on its own, these syndromes are likely neither fixed nor random, but conditional on the experienced environment. We experimentally studied how dispersal propensity varies with individuals' phenotype and local environmental harshness, using 15 species ranging from protists to vertebrates. We reveal a general phenotypic dispersal syndrome across the studied species: dispersers are larger, more active and have a marked locomotion-oriented morphology, and the link between dispersal and some phenotypic traits strengthens with environmental harshness. Our proof-of-concept metacommunity model further reveals cascading effects of context-dependent syndromes on the local and regional organisation of functional diversity. Our study opens new avenues to advance our understanding of the functioning of spatially structured populations, communities and ecosystems. Keywords: context-dependent dispersal; dispersal strategy; distributed experiment; predation risk; resource limitation.

    Resilience trinity: Safeguarding ecosystem functioning and services across three different time horizons and decision contexts

    Ensuring ecosystem resilience is an intuitive approach to safeguard the functioning of ecosystems and hence the future provisioning of ecosystem services (ES). However, resilience is a multi‐faceted concept that is difficult to operationalize. Focusing on resilience mechanisms, such as diversity, network architectures or adaptive capacity, has recently been suggested as a means to operationalize resilience. Still, the focus on mechanisms is not specific enough. We suggest a conceptual framework, the resilience trinity, to facilitate management based on resilience mechanisms in three distinctive decision contexts and time horizons: 1) reactive, when there is an imminent threat to ES resilience and a high pressure to act; 2) adjustive, when the threat is known in general but there is still time to adapt management; and 3) provident, when time horizons are very long and the nature of the threats is uncertain, leading to a low willingness to act. Resilience has different interpretations and implications at these different time horizons, which also prevail in different disciplines. Social ecology, ecology and engineering often implicitly focus on provident, adjustive or reactive resilience, respectively, but these different notions of resilience and their corresponding social, ecological and economic tradeoffs need to be reconciled. Otherwise, we keep risking unintended consequences of reactive actions, or shying away from provident action because of uncertainties that cannot be reduced. The suggested trinity of time horizons and their decision contexts could help ensure that longer‐term management actions are not missed while urgent threats to ES are given priority.